FRAGE: Frequency-Agnostic Word Representation
Continuous word representations (aka word embeddings) are a basic building block in many neural network-based models used in natural language processing tasks. Although it is widely accepted that words with similar semantics should be close to each other in the embedding space, we find that word embeddings learned in several tasks are biased towards word frequency: the embeddings of high-frequency and low-frequency words lie in different subregions of the embedding space, and the embedding of a rare word can be far from that of a popular word even when the two are semantically similar. This makes the learned word embeddings ineffective, especially for rare words, and consequently limits the performance of these neural network models. To mitigate this issue, we propose a simple yet effective adversarial training method that blurs the boundary between the embeddings of high-frequency and low-frequency words. We conduct comprehensive studies on ten datasets across four natural language processing tasks: word similarity, language modeling, machine translation, and text classification. Results show that our method outperforms the baselines in all tasks.
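The adversarial setup described above can be pictured with a toy logistic-regression discriminator over embeddings. This is a minimal sketch only — the discriminator form, the λ weight, and all function names are assumptions for illustration, not the paper's implementation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def discriminator_loss(w, b, embeddings, freq_labels):
    """Cross-entropy of a logistic discriminator that tries to tell
    high-frequency (label 1) from rare (label 0) word embeddings."""
    total = 0.0
    for x, y in zip(embeddings, freq_labels):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        total -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
    return total / len(embeddings)

def frage_objective(task_loss, disc_loss, lam=0.1):
    """Embeddings are trained to minimize the task loss while *fooling*
    the discriminator, i.e. maximizing its loss (hence the minus sign)."""
    return task_loss - lam * disc_loss
```

In training one would alternate: update the discriminator parameters (w, b) to minimize `discriminator_loss`, then update the embeddings against `frage_objective`, so frequency information becomes unrecoverable from the embedding space.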
Improving Natural Language Processing Tasks with Human Gaze-Guided Neural Attention
A lack of corpora has so far limited advances in integrating human gaze data as a supervisory signal in neural attention mechanisms for natural language processing (NLP). We propose a novel hybrid text saliency model (TSM) that, for the first time, combines a cognitive model of reading with explicit human gaze supervision in a single machine learning framework. On four different corpora we demonstrate that our hybrid TSM duration predictions are highly correlated with human gaze ground truth. We further propose a novel joint modeling approach to integrate TSM predictions into the attention layer of a network designed for a specific upstream NLP task without the need for any task-specific human gaze data. We demonstrate that our joint model outperforms the state of the art in paraphrase generation on the Quora Question Pairs corpus by more than 10% in BLEU-4 and achieves state-of-the-art performance for sentence compression on the challenging Google Sentence Compression corpus. As such, our work introduces a practical approach for bridging between data-driven and cognitive models and demonstrates a new way to integrate human gaze-guided neural attention into NLP tasks.
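One simple way to picture the joint modeling idea is a convex blend of the task's attention weights with the TSM saliency predictions, renormalized to a distribution. The fusion rule and names below are hypothetical illustrations; the paper's actual integration may differ:

```python
def fuse_attention(task_attn, saliency, lam=0.5):
    """Blend task-learned attention weights with TSM saliency scores,
    then renormalize so the result is again a distribution over tokens."""
    mixed = [(1 - lam) * a + lam * s for a, s in zip(task_attn, saliency)]
    z = sum(mixed)
    return [m / z for m in mixed]
```

With `lam=0` the model ignores gaze entirely; with `lam=1` it attends exactly where the saliency model predicts readers would fixate.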
Dynamic Temperature Scheduler for Knowledge Distillation
Islam, Sibgat Ul, Ahad, Jawad Ibn, Rahman, Fuad, Amin, Mohammad Ruhul, Mohammed, Nabeel, Rahman, Shafin
Knowledge Distillation (KD) trains a smaller student model using a large, pre-trained teacher model, with temperature as a key hyperparameter controlling the softness of output probabilities. Traditional methods use a fixed temperature throughout training, which is suboptimal. Moreover, architectural differences between teacher and student often result in mismatched logit magnitudes. We demonstrate that students benefit from softer probabilities early in training but require sharper probabilities in later stages. We introduce Dynamic Temperature Scheduler (DTS), which adjusts temperature dynamically based on the cross-entropy loss gap between teacher and student. To our knowledge, this is the first temperature scheduling method that adapts based on the divergence between teacher and student distributions. Our method integrates seamlessly with existing KD frameworks. We validate DTS across multiple KD strategies on vision (CIFAR-100, Tiny-ImageNet) and NLP tasks (GLUE, Dolly, SelfIns, UnNI, S-NI), consistently outperforming static-temperature baselines. Code is available at https://github.com/Sibgat-Ul/DTS.
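The scheduling idea — softer probabilities while the student lags, sharper ones as it catches up — can be sketched as a function of the student–teacher cross-entropy gap. The exact functional form, parameter names, and defaults below are assumptions, not the authors' schedule:

```python
import math

def dts_temperature(student_ce, teacher_ce, t_max=4.0, t_min=1.0, alpha=1.0):
    """Map the cross-entropy gap between student and teacher to a
    distillation temperature: a large gap (early training) yields a high
    temperature and soft targets; as the gap shrinks, the temperature
    decays toward t_min, sharpening the targets."""
    gap = max(student_ce - teacher_ce, 0.0)
    return t_min + (t_max - t_min) * (1.0 - math.exp(-alpha * gap))
```

The returned temperature would then divide both teacher and student logits before the softened KL distillation loss, exactly as in standard KD; only the temperature value changes over training.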
Enhancing the Effectiveness and Durability of Backdoor Attacks in Federated Learning through Maximizing Task Distinction
Wang, Zhaoxin, Wang, Handing, Tian, Cong, Jin, Yaochu
Federated learning allows multiple participants to collaboratively train a central model without sharing their private data. However, this distributed nature also exposes new attack surfaces. In particular, backdoor attacks allow attackers to implant malicious behaviors into the global model while maintaining high accuracy on benign inputs. Existing attacks usually rely on fixed patterns or adversarial perturbations as triggers, which tightly couple the main and backdoor tasks. This coupling makes them vulnerable to dilution by honest updates and limits their persistence under federated defenses. In this work, we propose an approach that decouples the backdoor task from the main task by dynamically optimizing the backdoor trigger within a min-max framework. The inner optimization maximizes the performance gap between poisoned and benign samples, ensuring that the contributions of benign users have minimal impact on the backdoor. The outer optimization injects the adaptive triggers into the local model. We evaluate our method on both computer vision and natural language tasks and compare it with six backdoor attack methods under six defense algorithms. Experimental results show that our method achieves strong attack performance and can be easily integrated into existing backdoor attack techniques.
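As a toy picture of the inner maximization, consider a scalar "model" and an additive trigger chosen from a candidate set to maximize the poisoned-versus-benign output gap. This is purely illustrative — the real method operates on image or text inputs with a learned, not grid-searched, trigger:

```python
def poisoned_gap(model, benign_x, trigger):
    """Toy gap: how differently the model scores a triggered input
    versus the clean one (stand-in for the poisoned/benign gap)."""
    return abs(model(benign_x + trigger) - model(benign_x))

def optimize_trigger(model, benign_x, candidates):
    """Inner maximization: pick the trigger that maximizes the gap,
    decoupling backdoor behavior from the main task so honest updates
    cannot easily dilute it."""
    return max(candidates, key=lambda t: poisoned_gap(model, benign_x, t))
```

In the full attack, the outer step would then inject this optimized trigger into the attacker's local training data before the model update is sent to the server.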
A Fine-Tuning Approach for T5 Using Knowledge Graphs to Address Complex Tasks
Liao, Xiaoxuan, Zhu, Binrong, He, Jacky, Liu, Guiran, Zheng, Hongye, Gao, Jia
With the development of deep learning technology, large language models have achieved remarkable results in many natural language processing tasks. However, these models still have limitations in handling complex reasoning tasks and understanding rich background knowledge. To address this problem, this study proposes a fine-tuning method for T5 based on knowledge graphs, which enhances the model's reasoning and context-understanding abilities by introducing external knowledge graphs. We used the SQuAD1.1 dataset for experiments. The results show that the knowledge-graph-based T5 model significantly outperforms the baseline models in reasoning accuracy, context understanding, and the ability to handle complex problems. We also explored the impact of knowledge graphs of different scales on model performance and found that performance gradually improves as the scale of the knowledge graph increases; the gain is especially large on complex problems. Ablation experiments further verify the importance of entity and relation embeddings in the model and show that a complete knowledge graph is crucial to the T5 model's capabilities. In summary, this study provides an effective method for enhancing the reasoning and understanding capabilities of large language models and suggests new directions for future research.
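A common way to feed a knowledge graph into a seq2seq model like T5 is to linearize triples into the input text. The sketch below shows one such format; the separators, field prefixes, and function names are assumptions for illustration, not the paper's exact scheme:

```python
def linearize_triples(triples):
    """Serialize (head, relation, tail) KG triples into a flat string."""
    return " ; ".join(f"{h} | {r} | {t}" for h, r, t in triples)

def build_t5_input(question, context, triples):
    """Prefix a SQuAD-style QA input with linearized knowledge so the
    encoder sees the external facts alongside question and context."""
    return (f"question: {question} "
            f"knowledge: {linearize_triples(triples)} "
            f"context: {context}")
```

The resulting string is tokenized and passed to the encoder as-is; scaling the knowledge graph then simply means serializing more (retrieved) triples into the `knowledge:` field.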
Review for NeurIPS paper: Improving Natural Language Processing Tasks with Human Gaze-Guided Neural Attention
Weaknesses: There are multiple issues with the claims and evaluations presented in the paper. In particular, as a reader, I am not convinced that the reported gains are due to exploiting gaze information. An improvement over SOTA? For the paraphrasing task, the paper claims Patro et al. (2018) as SOTA, which is an outdated baseline. Given that the "No Fixation" method achieves a 27.81 BLEU-4 score with 69M parameters, I doubt that the proposed model's 28.82 BLEU-4 score with 79M parameters is truly better than Patro et al. (2018)'s model. Ideally, the authors should report the performance of the baseline models using the same number of parameters.
Review for NeurIPS paper: Improving Natural Language Processing Tasks with Human Gaze-Guided Neural Attention
Overall, the reviewers appreciated this paper and thought it was an interesting contribution to the literature. Because of this I am recommending that the paper be accepted. However, there was one major request from Reviewer 3 (and me): that the authors appropriately frame their discussion of previous work, specifically [3]. Quoting Reviewer 3's discussion directly: "The authors claim to be the first to directly supervise attention, which has been done before with token-level annotation and is exactly what [3] does with gaze. This false claim of novelty is problematic, but also unnecessary, since this is a great paper that already makes decent contributions, e.g., smart pretraining."
Emotion Detection with Transformers: A Comparative Study
Analyzing the sentiment and emotion behind social media text can help us understand the attitudes, preferences, and feelings of users (Isah, Trundle, & Neagu, 2014). Unveiling these emotions goes beyond simply gauging sentiment, as it provides deeper insight into user motivations and psychological states. Capturing the nuances of human emotion expressed through text remains a complex task due to the limitations of language itself, and sentiment analysis, while valuable, only provides a surface-level understanding. Sentiment analysis is a natural language processing task that aims to classify the polarity of a text as positive, negative, or neutral (Khurana, Koli, Khatter, & Singh, 2023; Min et al., 2023). Emotion classification is a related task that aims to identify the specific emotion expressed in a text, such as sadness, joy, love, anger, fear, or surprise. Both tasks are challenging due to the complexity and variability of natural language. Transfer learning is a technique that allows a model trained on one task to be reused for another related task (Weiss, Khoshgoftaar, & Wang, 2016).
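Independent of which transformer is fine-tuned, the six-way emotion task reduces to a softmax over class logits produced by the classification head. A minimal sketch of that final step, with the label set taken from the text above and function names chosen here for illustration:

```python
import math

EMOTIONS = ["sadness", "joy", "love", "anger", "fear", "surprise"]

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def predict_emotion(logits, labels=EMOTIONS):
    """Map the head's raw logits to the most probable emotion label."""
    probs = softmax(logits)
    return labels[probs.index(max(probs))]
```

In a transfer-learning setup, the pretrained encoder supplies the sentence representation and only this small head (plus, optionally, the encoder) is fine-tuned on the emotion-labeled data.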